Sequences in the 5' non-coding region (5'NCR) of hepatitis C virus (HCV) were obtained from Scottish blood donors and compared with previously published HCV sequences. Phylogenetic analysis revealed three distinct groups of sequences: two corresponded to the recently described HCV type 1 and type 2 variants, while viral sequences detected in around a third of the blood donors formed a separate phylogenetic group that probably represents infection with a novel virus species. Nucleotide sequences of this latter group differed from all previously published 5'NCR sequence variants by at least 9%. This new virus type also differed considerably from previously published variants in other regions of the viral genome (core, NS-3 and NS-5), with corrected nucleotide distances of 15, 43 and 49%, respectively, from the prototype HCV-1 sequence. Formal phylogenetic analysis of each of the coding regions confirmed that HCV type 1 variants could be clearly differentiated into regional variants (Far East and U.S.A./European), in contrast to the clearly overlapping geographical distributions of the main HCV types in U.K. blood donors. We discuss the evidence for and against the hypothesis that the three main phylogenetic groups identified in this study represent separate species of HCV.



Introduction 

The aetiological agent responsible for most cases of post-transfusion non-A, non-B (NANB) hepatitis has been cloned and characterized (Choo et al., 1989, 1991). This virus, now termed hepatitis C virus (HCV), is a positive-strand RNA virus distantly related to the pestiviruses and flaviviruses (Miller & Purcell, 1990; Koonin, 1991). Its genome consists of a 5' non-coding region (5'NCR) of approximately 332 nucleotides, followed by a single continuous open reading frame encoding a polypeptide of around 3010 amino acids, and then a short 3' untranslated region (Kato et al., 1990; Takamizawa et al., 1991; Choo et al., 1991). By analogy with flaviviruses, this polypeptide has been divided into a 5' structural region consisting of putative core and envelope proteins and a 3' region corresponding to non-structural (NS-1 to NS-5) proteins. Recombinant proteins cloned from the prototype virus and synthetic peptides based on the viral sequence have been used to detect HCV antibodies (Kuo et al., 1989; Muraiso et al., 1990; Hosein et al., 1991), and screening of blood donors has been initiated in several countries to prevent post-transfusion NANB hepatitis. However, some donations that transmit hepatitis C virus are seronegative or 'indeterminate' in commercial serological tests (Esteban et al., 1990; van der Poel et al., 1991; Japanese Red Cross Non-A, Non-B Research Group, 1991). Some of these false negative serological results may arise from infection by extreme sequence variants of HCV that elicit an antibody response with limited or no cross-reactivity with the peptide antigens used in serological assays. Supporting this hypothesis is the recent discovery of HCV variants (Enomoto et al., 1990; Nakao et al., 1991; Okamoto et al., 1991) that differ markedly in sequence from the original prototype HCV (HCV-1; Choo et al., 1991) and from others found in Japanese patients (Kato et al., 1990; Takamizawa et al., 1991).

The nucleotide sequence data reported in this paper have been assigned GenBank accession numbers D10113 to D10134.

To investigate whether sequence heterogeneity might influence the effectiveness of serological screening for HCV in blood donors, we initiated a study of the nucleotide sequence diversity of HCV in naturally infected individuals. Available for the study were anti-HCV positive blood donors, intravenous drug users (IVDUs) and haemophiliacs previously exposed to non-heat-treated factor VIII, and who had biochemical evidence of liver disease (Simmonds et al., 1990b). Such studies may also assist in the development of type-specific and type-common antigens for serological diagnosis, allow the detection and typing of HCV variants by polymerase chain reaction (PCR), and ultimately assist vaccine research.



Methods 

Samples. Plasma samples from 18 different blood donors (E-b1 to E-b18), which were repeatedly reactive on screening by the Abbott second generation enzyme immunoassay (EIA) and confirmed or indeterminate by a recombinant immunoblot assay (RIBA, Ortho; Chan et al., 1991), were the principal samples used in this study. Sequences in the NS-3 region from five anti-HCV positive IVDUs (abbreviated as i1 to i5 in Simmonds et al., 1990b), five anti-HCV positive haemophiliacs who had received non-heat-treated clotting concentrate (h1 to h5), three pools of 1000 donations collected in 1983 (p1 to p3) and five separate batches of commercially available non-heat-treated factor VIII (f1 to f5) correspond to those described previously (Simmonds et al., 1990b).

Primers. The primers used for cDNA synthesis and PCR are listed in Table 1. They were synthesized by Oswel DNA Service, Department of Chemistry, University of Edinburgh, U.K.

RNA extraction and PCR. HCV virions in 0.2 to 10 ml volumes of plasma were pelleted by ultracentrifugation at 100000 g for 2 h at 4 °C. RNA was extracted from the pellet as previously described (Chomczynski & Sacchi, 1987; Simmonds et al., 1990b). First strand cDNA was synthesized from 3 µl of RNA sample at 42 °C for 30 min with 7 units of avian myeloblastosis virus reverse transcriptase (Promega) in 20 µl buffer containing 50 mM-Tris-HCl pH 8.0, 5 mM-MgCl2, 5 mM-DTT, 50 mM-KCl, 0.05 µg/µl BSA, 15% DMSO, 600 µM each of dATP, dCTP, dGTP and dTTP, 1.5 µM primer and 10 units RNasin (Promega).
Table 1. Sequences and sources of primers used for amplification of the HCV genome



PCR was performed on 1 µl of the cDNA over 25 cycles, each consisting of 25 s at 94 °C, 35 s at 50 °C and 2.5 min at 68 °C; the extension time for the last cycle was increased to 9.5 min. The reactions were carried out with 0.4 units Taq polymerase (Northumbria Biologicals) in 20 µl buffer containing 10 mM-Tris-HCl pH 8.8, 50 mM-KCl, 1.5 mM-MgCl2, 0.1% Triton X-100, 33 µM each of dATP, dCTP, dGTP and dTTP and 0.5 µM of each of the outer nested primers. An aliquot of the reaction mixture was then transferred to a second tube containing the same medium but with the inner pair of nested primers, and a further 25 heat cycles were carried out with the same programme. The PCR products were subjected to electrophoresis in 3% low melting point agarose gel (IBI) and the fragments were detected by ethidium bromide staining and u.v. illumination. For sequence analysis, single molecules of cDNA were obtained at a suitable limiting dilution, at which a Poisson distribution of positive and negative results was obtained (Simmonds et al., 1990a).
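The limiting-dilution step relies on Poisson statistics. As a minimal sketch, assuming templates are Poisson-distributed among replicate reactions (the function name `poisson_single_molecule` is hypothetical, and this is a textbook estimate rather than the authors' published procedure), the fraction of PCR-positive replicates at one dilution can be converted into the mean number of cDNA molecules per reaction and the probability that a positive reaction was seeded by a single molecule:

```python
import math


def poisson_single_molecule(fraction_positive: float) -> tuple[float, float]:
    """Estimate template occupancy at one dilution from replicate PCR results.

    Returns (mean cDNA copies per reaction, probability that a positive
    reaction was seeded by exactly one molecule), assuming templates are
    Poisson-distributed among replicate reactions.
    """
    if not 0.0 < fraction_positive < 1.0:
        raise ValueError("fraction_positive must lie strictly between 0 and 1")
    # P(no template) = exp(-lam), so lam = -ln(1 - fraction_positive).
    lam = -math.log(1.0 - fraction_positive)
    # P(exactly one copy | at least one copy) = lam * exp(-lam) / (1 - exp(-lam)).
    p_single = lam * math.exp(-lam) / (1.0 - math.exp(-lam))
    return lam, p_single


if __name__ == "__main__":
    # Hypothetical example: 3 of 10 replicate reactions positive.
    lam, p_single = poisson_single_molecule(0.3)
    print(f"mean copies per reaction = {lam:.3f}; "
          f"P(single molecule | positive) = {p_single:.3f}")
```

With 30% of replicates positive, the sketch gives about 0.36 copies per reaction and roughly 83% of positive reactions derived from a single molecule, which illustrates why dilutions yielding a minority of positive reactions are the ones used for single-molecule sequencing.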

Direct sequencing of PCR products. The PCR products were purified by glass-milk extraction (GeneClean; Bio 101). One-quarter of the purified product was used in sequencing reactions with T7 polymerase (Sequenase; United States Biochemical) performed according to the manufacturer's instructions, except that the reactions were carried out in 10% DMSO and the template DNA was heat-denatured before primer annealing.

Phylogenetic methods. The sequences were compiled using the programs of Staden (1984) and analysed with programs available in the University of Wisconsin Genetics Computer Group sequence analysis package, version 7.0 (Devereux et al., 1984). Phylogenetic trees were inferred using two different programs available in the PHYLIP package of Felsenstein (version 3.4, June 1991; Felsenstein, 1988). The program DNAML finds the tree of highest likelihood (the maximum likelihood tree) given a particular stochastic model of molecular evolution, and has been shown to perform well in simulation studies (Saitou & Imanishi, 1989). In the analyses performed here the global (G) option was used, as this searches a greater proportion of possible trees. The second program used was NEIGHBOR, which clusters (following the algorithm of Saitou & Nei, 1987) a matrix of nucleotide distances previously estimated using the program DNADIST (itself set, using the D option, to use the same stochastic model as underlies DNAML, in order to estimate distances corrected for the probability of multiple substitutions). In all cases the maximum likelihood and neighbour joining procedures produced congruent trees, and thus only the former are presented here.
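NEIGHBOR is a PHYLIP program; the following is only an illustrative, from-scratch Python sketch of the Saitou & Nei (1987) neighbour-joining recursion, not PHYLIP's code. The function name `neighbour_joining` and the five-taxon toy matrix in the usage block are hypothetical. Clusters are labelled by their Newick strings, and the last three clusters are joined at a central node, so the result is an unrooted tree in Newick format:

```python
def neighbour_joining(names, matrix):
    """Neighbour-joining (Saitou & Nei, 1987) on a symmetric distance matrix.

    names: list of unique taxon labels; matrix: list of lists with zeros on
    the diagonal. Returns an unrooted tree as a Newick string.
    """
    if len(names) < 3:
        raise ValueError("need at least three taxa")
    # D[a][b] holds the current distance between clusters a and b.
    D = {a: {b: float(matrix[i][j]) for j, b in enumerate(names)}
         for i, a in enumerate(names)}
    nodes = list(names)
    while len(nodes) > 3:
        n = len(nodes)
        r = {a: sum(D[a][b] for b in nodes) for a in nodes}  # net divergence
        # Join the pair minimising Q(a, b) = (n - 2) * D(a, b) - r(a) - r(b).
        best = None
        for i, a in enumerate(nodes):
            for b in nodes[i + 1:]:
                q = (n - 2) * D[a][b] - r[a] - r[b]
                if best is None or q < best[0]:
                    best = (q, a, b)
        _, a, b = best
        # Branch lengths from a and b to their new parent node u.
        la = 0.5 * D[a][b] + (r[a] - r[b]) / (2 * (n - 2))
        lb = D[a][b] - la
        u = f"({a}:{la:.4f},{b}:{lb:.4f})"
        D[u] = {u: 0.0}
        for c in nodes:
            if c not in (a, b):
                D[u][c] = D[c][u] = 0.5 * (D[a][c] + D[b][c] - D[a][b])
        nodes = [c for c in nodes if c not in (a, b)] + [u]
    # Three clusters remain: join them at a single central node (unrooted).
    a, b, c = nodes
    la = 0.5 * (D[a][b] + D[a][c] - D[b][c])
    lb = 0.5 * (D[a][b] + D[b][c] - D[a][c])
    lc = 0.5 * (D[a][c] + D[b][c] - D[a][b])
    return f"({a}:{la:.4f},{b}:{lb:.4f},{c}:{lc:.4f});"


if __name__ == "__main__":
    # Hypothetical toy distances, not HCV data.
    taxa = ["a", "b", "c", "d", "e"]
    dist = [[0, 5, 9, 9, 8],
            [5, 0, 10, 10, 9],
            [9, 10, 0, 8, 7],
            [9, 10, 8, 0, 3],
            [8, 9, 7, 3, 0]]
    print(neighbour_joining(taxa, dist))
```

For the toy matrix this prints `(d:2.0000,e:1.0000,(c:4.0000,(a:2.0000,b:3.0000):3.0000):2.0000);`. Ties in Q are broken by the first pair encountered, which is one reason production programs such as NEIGHBOR offer input-order randomization.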



Name   Region   Position of 5' base*   Sequence (5' to 3')
209    5'NCR        8                  ATACTCGAGGTGCACGGTCTACGAGACCT
211    5'NCR      -29                  CACTCTCGAGCACCCTATCAGGCAGT
939    5'NCR     -297                  CTGTGAGGAACTACTGTCTT
940    5'NCR     -279                  TTCACGCAGAAAGCGTCTAG
410    Core       410                  ATGTACCCCATGAGGTCGGC
406    Core       -21                  AGGTCTCGTAGACCGTGCATCATGAGCAC
288    NS-3      4951                  CCGGCATGCATGTCATGATGTAT
290    NS-3      4932                  GTATTTGGTGACTGGGTGCGTC
208    NS-3      4662                  TCTTGAATTTTGGGAGGGCGTCTT
207    NS-3      4699                  CATATAGATGCCCACTTCCTATC
242    NS-5      8304                  GGCGGAATTCCTGGTCATAGCCTCCGTGAA
555    NS-5      8227                  CCACGACTAGATCATCTCCG
243    NS-5      7904                  TGGGGATCCCGTATGATACCCGCTGCTTTGA
554    NS-5      7935                  CTCAACCGTCACTGAACAGGACAT

Primer sources: Garson et al. (1990); Okamoto et al. (1990a); Simmonds et al. (1990b); Enomoto et al. (1990).

* Position of 5' base relative to HCV genomic sequence in Choo et al. (1991).
To establish the interrelationships of the major types of HCV, we have separately analysed several regions of the viral genome that differ in sequence variability and evolutionary constraint. The conclusions drawn from the sequence comparisons therefore do not depend on spurious evolutionary phenomena that may affect any single region. However, one limitation of the analysis presented here was the absence of a viral sequence sufficiently distantly related to HCV to serve as an out-group. Thus, although we describe the interrelationships of different sequence variants of HCV, we have no means of deciding which sequence is ancestral to the others, and the trees are therefore drawn in the less familiar unrooted form. All sequences reported in this publication have been submitted to GenBank (accession numbers D10113 to D10134).



Results 

Analysis of the 5'NCR

Samples were obtained from 18 blood donors who were repeatedly reactive in the Abbott second generation EIA and confirmed or indeterminate in the Chiron 4-RIBA (E-b1 to E-b18; Follett et al., 1991). HCV sequences present in stored plasma samples from each donor were amplified with primers corresponding to sites in the 5'NCR (Garson et al., 1990; Okamoto et al., 1990a) that are well conserved between all known HCV variants (Table 2, sequences 1 to 15, 28). Sequencing of the PCR product, after limiting dilution to isolate single molecules of cDNA before amplification, allowed approximately 190 bp in the centre of the region to be compared with equivalent published sequences (Fig. 1).
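The sequence figures in this paper use dot notation: the top sequence is written out in full and, in every other row, a dot marks identity with it, so only substitutions are printed. A minimal Python sketch of that display convention (the function `dot_format` and the example sequences are hypothetical, for illustration only):

```python
def dot_format(aligned: dict[str, str]) -> list[str]:
    """Render aligned sequences in dot notation.

    The first entry is the reference (top line) and is written out in full;
    in every later row, positions identical to the reference become '.',
    and only substitutions (or gaps) are printed.
    """
    names = list(aligned)
    reference = aligned[names[0]]
    width = max(len(name) for name in names)
    rows = []
    for index, name in enumerate(names):
        seq = aligned[name]
        if len(seq) != len(reference):
            raise ValueError(f"{name} is not aligned to the reference length")
        if index == 0:
            shown = seq
        else:
            shown = "".join("." if base == ref else base
                            for base, ref in zip(seq, reference))
        rows.append(f"{name.ljust(width)}  {shown}")
    return rows


if __name__ == "__main__":
    # Hypothetical sequences, for illustration only.
    example = {"ref": "GGCGTTAGTA", "var1": "GGCGTTAGTG", "var2": "GACGTTAGTA"}
    for row in dot_format(example):
        print(row)
```

Because identical positions collapse to dots, substitutions shared by a whole group of rows stand out as vertical columns, which is how the group-specific sites are visible at a glance in such alignments.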



The alignment contains both conserved and variable regions. Six sequences, from donors E-b13 to E-b18, closely resembled HCV-1 (sequence no. 1) and the other published sequences 2 to 15 (Table 2), whereas four others (E-b9 to E-b12) resembled the recently reported highly divergent K2 and HC-J6 sequences (nos. 24, 27, 28). However, eight sequences (E-b1 to E-b8) appear quite distinct from the others. Division of the sequences into three groups is supported by formal phylogenetic analysis of the blood donor sequences together with previously published sequences (identified in Table 2), using the maximum likelihood (Fig. 2) and neighbour joining (data not shown) algorithms. The group labelled 1 contains sequences of HCV with a world-wide distribution (sequences 1 to 15; Table 2), and group 2 contains the K2 and HC-J6 sequences (nos. 24, 27, 28). Sequence variability within each of the three groups is considerably less than that which separates them, and no sequence intermediate between the groups was found. The tree shows that the third group is as distinct from group 1 as is group 2. Using the DNAML model, the corrected distances between sequences within each group were in each case less than 3%. Between groups, they ranged from 9% (between groups 1 and 3, and between groups 1 and 2) to 13% (between groups 2 and 3) (Table 3).
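Corrected distances adjust the observed proportion of differing sites for multiple substitutions at the same site. DNAML and DNADIST use the substitution model built into PHYLIP; purely as a simpler stand-in, the Python sketch below applies Kimura's two-parameter correction, which likewise treats transitions and transversions separately. The function name `k2p_distance` and the example sequences are hypothetical, and its values will not reproduce the DNAML-based figures in Table 3:

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq1: str, seq2: str) -> float:
    """Kimura two-parameter distance between two aligned nucleotide sequences.

    Sites where either sequence has a gap or ambiguity code are skipped.
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    sites = transitions = transversions = 0
    for x, y in zip(seq1.upper(), seq2.upper()):
        if x not in "ACGT" or y not in "ACGT":
            continue
        sites += 1
        if x == y:
            continue
        if {x, y} <= PURINES or {x, y} <= PYRIMIDINES:
            transitions += 1  # A<->G or C<->T
        else:
            transversions += 1
    if sites == 0:
        raise ValueError("no comparable sites")
    p = transitions / sites
    q = transversions / sites
    if 1 - 2 * p - q <= 0 or 1 - 2 * q <= 0:
        raise ValueError("sequences too divergent for the K2P correction")
    return -0.5 * math.log(1 - 2 * p - q) - 0.25 * math.log(1 - 2 * q)


if __name__ == "__main__":
    # Hypothetical pair: one transition in ten sites (uncorrected p = 0.100).
    print(round(k2p_distance("GGCGTTAGTA", "GGCGTTAGTG"), 4))
```

Here the single transition in ten sites gives a corrected distance of about 0.1116, slightly above the raw 0.100; the gap between raw and corrected values grows rapidly as sequences diverge, which matters for the 40 to 60% distances seen in the coding regions.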

Analysis of the NS-5 region 

The nucleotide sequence of the NS-5 region has been found to vary significantly between the previously described K1 and K2 variants of HCV (Enomoto et al., 1990). To investigate whether the new group of sequences identified in the 5'NCR was equally distant from the other two groups in this region, we compared sequences from four blood donors infected with group 3 variants (E-b1, E-b2, E-b3 and E-b7) and one infected with a group 2 variant (E-b12) with previously published sequences (Fig. 3). As NS-5 is a coding region, the inferred amino acid sequences are presented to indicate the degree of phenotypic variation between the different viral sequences. Phylogenetic analysis of the nucleotide sequences is shown in Fig. 4; corrected nucleotide distances between the different groups are shown in Table 3.

Table 2. Source and citation of previously published HCV sequences used in this study

No.      Type*      Abbreviation       Geographical source   Reference
1        1 (I)      HCV-1              U.S.A.                Choo et al. (1991)
2        1 (I)      Pt-1               Japan                 Nakao et al. (1991); Enomoto et al. (1990)
3, 4     1 (I)      H77, H90           U.S.A.                Ogata et al. (1991)
5, 6     1 (I)      GM-1, GM-2         Germany               Fuchs et al. (1991)
7        1          J1                 Japan                 Han et al. (1991)
8        1          A1                 Australia             Han et al. (1991)
9        1          S1                 S. Africa             Han et al. (1991)
10       1          T1                 Taiwan                Han et al. (1991)
11       1          U18/I24            U.S.A./Italy          Han et al. (1991)
12       1 (II)     HCV-J              Japan                 Kato et al. (1990)
13       1 (II)     HCV-BK             Japan                 Takamizawa et al. (1991)
14, 15   1 (I, II)  HC-J1, HC-J4       Japan                 Okamoto et al. (1990b)
16-20    1 (II)     K1, K1-1 to K1-4   Japan                 Enomoto et al. (1990)
21       1 (II)     JH                 Japan                 Kubo et al. (1989)
22       1 (II)     J7                 Japan                 Takeuchi et al. (1990)
23       1 (II)     T3                 Taiwan                Chen et al. (1991)
24-27    2 (III)    K2a, K2a-1         Japan                 Nakao et al. (1991)
         2 (IV)     K2b, K2b-1                               Enomoto et al. (1990)
28       2 (III)    HC-J6              Japan                 Okamoto et al. (1991)
29       2          Clone A            Japan                 Tsukiyama-Kohara et al. (1991)

* Designation of sequences follows the classification described by Enomoto et al. (1990). The alternative classification (Houghton et al., 1991) is shown in parentheses (see Discussion).

[Fig. 1 alignment (5'NCR positions -255 to -62): group 3, E-b1 to E-b8; group 2, K2a, HC-J6, E-b9 to E-b12 and K2b-1; group 1, E-b13 to E-b18, HCV-1, Pt-1, H77, H90, GM1, GM2, J1, A1, S1, T1, U18/I24, HCV-J, HCV-BK, HC-J1 and HC-J4.]

Fig. 1. Comparison of nucleotide sequences in the 5'NCR from British blood donors (E-b1 to E-b18) with previously published HCV sequences. Dots indicate identity with the sequence of E-b1 (top line); nucleotide substitutions are indicated. Nucleotide numbering corresponds to that described for the prototype HCV-1 sequence (Choo et al., 1991). Source and citation of published sequences are shown in Table 2; phylogenetic groups are indicated in the left column.

Marked variation was observed between sequences of the three groups in this region. Again, sequences falling into the third group clustered separately from the others. However, unlike in the 5'NCR, there appear to be subdivisions within the other groups. Group 1 sequences are split between those found in Japanese infected individuals (e.g. HCV-J, HCV-BK; sequence numbers 12, 13, 16 to 20 in Table 2) and those of U.S.A. origin (HCV-1, Pt-1, H77, H90; sequence numbers 1 to 4; Fig. 4). There is also some evidence for a split between group 2 sequences, with K2a and HC-J6 (nos. 24, 25 and 28) appearing distinct from type K2b sequences (nos. 26, 27) and the Scottish blood donor sequence E-b12.

Table 3 shows that the average nucleotide distance between the two clusters of HCV group 1 sequences is 25% [indicated here as type 1a/I (U.S.A.) and type 1b/II (Japanese)], with variation of only 4 to 7% within each. However, this distance is considerably less than those between group 1 and group 2 sequences (52 to 62%) and between group 1 and group 3 sequences (48 to 49%), and than the distance between group 2 and group 3 sequences (53 to 61%).
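Table 3 summarises many pairwise comparisons as average distances within and between groups. The paper does not spell out the averaging procedure, so the Python sketch below simply assumes an arithmetic mean of pairwise corrected distances; the function name `group_mean_distances`, the group labels and the distance values are hypothetical:

```python
from itertools import combinations, product
from statistics import mean


def group_mean_distances(groups, distance):
    """Mean pairwise distances within and between groups of sequences.

    groups: dict mapping group label -> list of sequence names.
    distance: function (name1, name2) -> corrected distance.
    Returns (within, between); groups with a single member are omitted from
    `within`, since they contribute no pairs.
    """
    within = {
        label: mean(distance(a, b) for a, b in combinations(members, 2))
        for label, members in groups.items()
        if len(members) > 1
    }
    between = {
        (g1, g2): mean(distance(a, b) for a, b in product(groups[g1], groups[g2]))
        for g1, g2 in combinations(groups, 2)
    }
    return within, between


if __name__ == "__main__":
    # Hypothetical distances, for illustration only.
    pairs = {
        frozenset({"x1", "x2"}): 0.04,
        frozenset({"x1", "y1"}): 0.48,
        frozenset({"x2", "y1"}): 0.50,
    }
    within, between = group_mean_distances(
        {"1": ["x1", "x2"], "3": ["y1"]},
        lambda a, b: pairs[frozenset({a, b})],
    )
    print(within, between)
```

Single-member groups (such as a lone reference sequence) contribute no within-group pairs, which is consistent with Table 3 reporting 0.0000 on the diagonal wherever n = 1.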



Table 3. Nucleotide distances between the three HCV groups in four regions of the genome

5'NCR†
Group*       n    1         2         3
1           21    0.0163
2            7    0.0867    0.0214
3            8    0.0948    0.1331    0.0123

Core
Group*       n    1a (I)    1b (II)   2a (III)  3
1a (I)       6    0.0227
1b (II)      5    0.0855    0.0359
2a (III)     1    0.2226    0.2051    0.0000
3            1    0.1511    0.1802    0.2188    0.0000
2b (IV): ND§

NS-3
Group*       n    1a (I)    1b (II)   2a (III)  2b (?IV)||  3
1a (I)      34    0.0536
1b (II)      4    0.2310    0.0823
2a (III)     1    0.3416    0.4072    0.0000
2b (?IV)||   1    0.4337    0.3447    0.2776    0.0000
3            4    0.4281    0.3747    0.4885    0.3552      0.0460

NS-5
Group*       n    1a (I)    1b (II)   2a (III)  2b (IV)   3
1a (I)       4    0.0372
1b (II)      7    0.2478    0.0743
2a (III)     3    0.6170    0.5920    0.0789
2b (IV)      3    0.5732    0.5215    0.2328    0.0655
3            4    0.4890    0.4755    0.6051    0.5300    0.0322

* See footnote to Table 2.
† Groups 1 and 2 were not subdivided in the 5'NCR, so a single set of distances is given for each.
§ ND, not done.
|| HCV clone A sequence shown as 2b/IV for presentation purposes only (see text).
[Fig. 3 alignment (NS-5 amino acid residues from 2648): group 3, E-b1, E-b2, E-b3 and E-b7; group 2, K2a, HC-J6, K2a-1, E-b12, K2b and K2b-1; group 1, HCV-1, Pt-1, H77 and H90.]

Fig. 3. Comparison of deduced amino acid sequences in the NS-5 region of blood donors E-b1, E-b2, E-b3, E-b7 (type 3) and E-b12 (type 2) with those previously published (Table 2). Amino acid residue numbering follows that of the HCV-1 polyprotein (Choo et al., 1991). Dots indicate identity with the sequence of E-b1 (top line); amino acid substitutions are indicated; phylogenetic grouping in the 5'NCR (Fig. 2) is indicated in the left column.




Fig. 4. Phylogenetic analysis of nucleotide sequences in the NS-5 
region using the maximum likelihood algorithm, shown as an unrooted 
tree. Symbols are as described in the legend to Fig. 2. 



Analysis of the NS-3 region 

Amplification reactions were carried out using previously published primer sequences in the NS-3 region (Weiner et al., 1990) and a pair of empirically derived inner primers (Simmonds et al., 1990b). Although these primers amplified HCV sequences from a high proportion of anti-C-100 positive sera from haemophiliacs, they were less effective with sera from IVDUs (Simmonds et al., 1990b) and with blood donor samples (three positive out of 15 tested; data not shown). Two conserved sites in the amplified fragment were identified by sequence analysis of the NS-3 region from the haemophiliac and IVDU patients, and two new primers corresponding to these were designed (207, 208; Table 1). The combination of primers 288 and 208 (first round) and 290 and 207 (second round) successfully amplified samples from four donors infected with group 3 HCV variants (E-b1, E-b2, E-b6 and E-b7) but none of those infected with group 2 sequences (data not shown). This enabled a comparison of group 3 sequences with our own (Simmonds et al., 1990b) and previously published sequences (Figs 5 and 6; Table 3). For clarity, only seven of the group 1 sequences obtained in this and our previous study [E-b16, E-b17, i3, i4, h5, h3 and h1 (nos. 19 to 23, filled circles)] are shown in the tree. These sequences are representative of the range of variation found in this region in individuals infected in Britain; comparison of the previously published tree (Simmonds et al., 1990b) with Fig. 6 shows that the former forms a very small component of the overall tree obtained once published and group 3 sequences are added.

The maximum likelihood tree again shows three main groups of sequences. As was found in the NS-5 region, sequences assigned to group 1 in the 5'NCR are split into geographical groups in NS-3. Sequences of Japanese and Taiwanese origin (HCV-J, HCV-BK, JH and T3; nos. 12, 13, 21, 23; Table 2) are distinct from the HCV-1 sequence (no. 1) and from those found in Scottish blood donors (E-b16, E-b17, p1 to p3), IVDUs (i1 to i5) and haemophiliacs (h1 to h5) (Fig. 5). However, variation within group 1 (23%) is lower than that between group 1 and the four group 3 sequences (37 to 43%), and between group 1 and the group 2 sequence HC-J6, no. 28 (34 to 41%). As reported previously (Simmonds et al., 1990b), the majority of nucleotide substitutions between type 1 sequences are silent (i.e. do not affect the encoded amino acid sequence), but numerous amino acid substitutions exist between sequences in the three main groups (Fig. 5). The analysis of the NS-3 region includes the sequence of clone A, labelled as no. 29 (Tsukiyama-Kohara et al., 1991), which was obtained from Japanese patients with NANB hepatitis and was reported to be distinct from existing HCV sequences. In Fig. 6, this sequence appears to cluster with HC-J6 (no. 28) rather than with the group 1 and 3 sequences (Table 3). Although sequence data for clone A are restricted, it may be homologous with K2b/E-b12, as the sequence distance between clone A and HC-J6 is comparable to that between K2b/E-b12 and HC-J6 in the NS-5 region (Table 3).

Fig. 6. Phylogenetic analysis of nucleotide sequences in the NS-3 region. Representative nucleotide sequences of the five groups of type 1 sequences shown in Fig. 5 are marked with separate symbols. Symbols are otherwise as described in the legend to Fig. 2.
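The distinction between silent and amino acid-replacing substitutions can be illustrated by comparing two aligned, in-frame coding sequences codon by codon under the standard genetic code. The Python sketch below is a deliberately crude count (every nucleotide difference in a codon is attributed to that codon's overall outcome), not the site-normalised Nei-Gojobori method or the authors' procedure; the function name `classify_differences` and the example sequences are hypothetical:

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, ...).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO)}


def classify_differences(cds1: str, cds2: str) -> dict[str, int]:
    """Count nucleotide differences in codons that keep or change the amino acid.

    Both sequences must be aligned, in frame and of a length divisible by 3.
    Codons containing gaps or ambiguity codes are skipped.
    """
    if len(cds1) != len(cds2) or len(cds1) % 3:
        raise ValueError("expected aligned coding sequences of equal length, "
                         "a multiple of 3")
    counts = {"silent": 0, "replacement": 0}
    for i in range(0, len(cds1), 3):
        c1, c2 = cds1[i:i + 3].upper(), cds2[i:i + 3].upper()
        diffs = sum(a != b for a, b in zip(c1, c2))
        if diffs == 0:
            continue
        aa1, aa2 = CODON_TABLE.get(c1), CODON_TABLE.get(c2)
        if aa1 is None or aa2 is None:
            continue
        counts["silent" if aa1 == aa2 else "replacement"] += diffs
    return counts


if __name__ == "__main__":
    # Hypothetical pair: CTT->CTC keeps Leu (silent); AAA->AGA is Lys->Arg.
    print(classify_differences("CTTAAA", "CTCAGA"))
```

Because a codon carrying both a silent and a replacement change is counted wholly as replacement, this overstates replacements in divergent comparisons; site-normalised methods avoid that bias.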

Partial sequence of the putative core region of HCV 
The region encoding the putative core protein of HCV is comparatively well conserved in its nucleotide sequence between previously published sequences. Sequences classified in the 5'NCR as group 1 have nucleotide and amino acid sequence similarities of 90 to 98% and … to 99%, respectively (Fuchs et al., 1991; Ogata et al., 1991), whereas 81% (nucleotide) and 90% (amino acid) similarities are reported between HC-J6 (group 2) and HCV-1 (group 1; Okamoto et al., 1991). The core region from blood donor E-b1 (group 3), amplified with primers 410 and 406, was found to be distinct from both group 1 and group 2 (Figs 7 and 8; Table 3). Again there was a prominent subdivision of group 1 sequences into Japanese (HCV-J, HCV-BK, HC-J4, JH and J7; sequences 12, 13, 15, 21, 22) and U.S.A./European (HCV-1, H77, H90, GM1, GM2, HC-J1; nos. 1, 3 to 6, 14) sequences. As was found in NS-3, very little amino acid sequence variation is found between geographically separated variants of group 1 in the core region; almost all of the nucleotide differences between the two subgroups are at 'silent' sites. By contrast, HC-J6 shows at least 10, and the E-b1 (group 3) sequence at least eight, amino acid substitutions in comparison with group 1 sequences.

Fig. 7. Comparison of deduced amino acid sequences in the HCV core region of blood donor E-b1 (type 3) with those previously published (Table 2). Numbering, symbols and abbreviations are as described in the legend to Fig. 3.




Fig. 8. Phylogenetic analysis of nucleotide sequences in the core region 
using the maximum likelihood algorithm, shown as an unrooted tree. 
Symbols are as described in the legend to Fig. 2. 
Discussion

A major difficulty associated with the use of the PCR to 
assess sequence variation is the possibility that mis- 
matches between the primers and the variant sequence 
will prevent amplification. In this study, we have used 
several strategies to overcome this problem. For initial 
virus detection, we used primers in the 5'NCR, which are 
reported to be highly conserved amongst previously 
published sequences (sequences 1 to 15, 28; Table 2). 
Analysis of the blood donor samples revealed the 
existence of three phylogenetically distinct groups of 
sequences. Sequences in the third group have not been 
previously described. Based on this grouping, we sought 
corroboration of our findings in other (coding) and more 
variable regions of the viral genome. 

Analysis of the NS-5 region, which was based on 
several sequences of each of the three main variants (Fig. 
3, 4; Table 3), revealed that group 3 sequences formed a 
relatively homogeneous group that was quite distinct 
from published sequences. Group 1 and group 2 sequences, which appeared homogeneous in the 5'NCR, each split into two distinct clusters. The proposed subdivision of K2 sequences (group 2; Enomoto et al., 1990) into K2a and K2b types is also supported by the phylogenetic analysis presented in this paper. We observed that one of the blood donor sequences obtained in this study (E-b12) appears most similar to K2b, and that HC-J6 (Okamoto et al., 1991)
falls into the K2a group. Differentiation of HCV type 1 
sequences into two groups is also clearly shown in the 
core (Fig. 7) and NS-3 regions (Fig. 5), in both cases with 
the group 2 and 3 sequences appearing considerably 
more distant. 

The question of whether the distinct phylogenetic groupings of HCV constitute separate species cannot be clearly resolved at this stage, as the criteria by which HCV variants may be so differentiated have not yet been defined. Circumstantial evidence in favour of the distinct species hypothesis includes, first, the very marked clustering of phylogenetically distinct groups without intermediate sequences. Secondly, we and others have found overlapping distributions of the groups in a single geographical area, and indeed all three groups appear capable of mixed infection in multiply exposed individuals such as haemophiliacs (Enomoto et al., 1990; Nakao et al., 1991; Chan et al., 1991; Pozzato et al., 1991). Such findings run contrary to the alternative hypothesis that the different groups represent epidemiologically separated, and hence divergent, variants of the same species.

Unlike the 5'NCR, where there are only three distinct groups, each of the coding regions shows prominent differentiation of group 1 sequences into two separate clusters. In this case, the sequences in the two subgroups are geographically separated into Far East (Japan, Taiwan) and U.S.A./European variants (Table 2). An exception is the HC-J1 sequence (no. 14; Okamoto et al., 1990b), which clusters with the U.S.A./European group in the core region (Fig. 8) but was obtained from a Japanese patient with NANB hepatitis. The Pt-1 sequence (no. 2), which also clusters away from the Far East group, was obtained from a Japanese haemophiliac treated with imported factor VIII of U.S.A. origin (Enomoto et al., 1990; Nakao et al., 1991). There are insufficient sequence data to indicate whether the two proposed type 2 subtypes, K2a and K2b (Enomoto et al., 1990; Nakao et al., 1991), also represent geographically distinct variants.

There have been several attempts to classify HCV 
sequence variants into different types. Enomoto et al. 
(1990) classified sequences corresponding to the phylogenetic groups labelled 1 and 2 in this paper as K1/PT and K2, the first being subdivided along the geographical lines discussed previously, and K2 comprising the K2a and K2b subgroups. Houghton et al. (1991)
classified sequences as type I (sequences of U.S.A./ 
European origin in our phylogenetic group 1), type II 
(Far East group 1 sequences) and type III (sequences in 
our phylogenetic group 2), presumably with the option of 
extending the classification to type IV to differentiate the 
two clusters of group 2 sequences. However, this 
classification is uneven in that the difference between 
type I and II sequences in each protein coding region is 
consistently less than that between I and III and between 
I and the new group 3 sequences reported here (Table 3). 
It is also particularly noticeable that far fewer amino acid 
substitutions exist between type I and II sequences than 
between other variants, and type I and II sequences 
cannot be differentiated in the 5'NCR. For these 
reasons, we would favour classifying our new sequences 
in group 3 as type 3, although if the alternative system is 
accepted, we propose that they be provisionally termed 
type V. 

The genomic organization of HCV corresponds to that 
of flaviviruses and pestiviruses, with a single open 
reading frame encoding a polyprotein that is sub- 
sequently cleaved into structural and non-structural 
proteins. Although the overall degree of sequence 
dissimilarity between the three main groups cannot be 
measured by comparison of the small regions of sequence 
analysed in this study, a rough estimate of the extent of 
divergence in protein coding regions is given by an 
examination of the divergence of the partial core 
sequences obtained here. This shows that the difference 
between HCV group 1, 2 and 3 core regions (approxi- 
mately 10% amino acid sequence divergence) is compar- 
able to that which exists between different serotypes of 
the flavivirus, tick-borne encephalitis virus (14%; Mandl
et al., 1988), but lower than that which is found between
serotypes of a mosquito-borne flavivirus, dengue fever 
virus (67%), and the West Nile virus (WNV) subgroup 
(60%). The 5'NCR sequences of the different members 
of the WNV subgroup are also considerably more diverse 
( < 50% similarity) than those of the three major types of 
HCV, although within each of the members, e.g. Murray 
Valley encephalitis virus, the 5'NCR is extremely well 
conserved (>95% similarity; Coelen & Mackenzie, 
1990). On the basis of these analogies, we speculate that 
the major types of HCV could conceivably represent 
distinct serotypes, each capable of human infection 
irrespective of the immune response mounted against 
other HCV types. 
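Divergence percentages such as the roughly 10% amino acid divergence between core regions are, in their simplest form, uncorrected pairwise distances: the proportion of aligned positions at which two sequences differ. A minimal Python sketch, assuming the sequences are already aligned and that positions with a gap in either sequence are ignored (how gaps were handled in the original analysis is not stated); the two peptide strings are hypothetical:

```python
def p_distance(seq_a: str, seq_b: str, gap: str = "-") -> float:
    """Fraction of aligned, ungapped positions at which two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = differences = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == gap or b == gap:
            continue  # skip positions with a gap in either sequence
        compared += 1
        if a != b:
            differences += 1
    if compared == 0:
        raise ValueError("no ungapped positions to compare")
    return differences / compared


# Hypothetical aligned 20-residue fragments differing at 2 positions.
core_a = "ACDEFGHIKLMNPQRSTVWY"
core_b = "ACDEFGHIKLMNPARSTVWF"
print(f"{100 * p_distance(core_a, core_b):.0f}% divergence")  # 10% divergence
```

Model-corrected distances adjust this raw proportion for multiple substitutions at the same site, so they are larger than the uncorrected value for divergent sequences.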

The existence of different hepatitis C viral types opens 
up the possibility that the distinct clinical disease 
syndromes associated with HCV infection may reflect 
underlying differences in the pathogenicity of the 
different types of the virus. There is some evidence that 
infection with HCV group 2 variants (type III) leads to 
more severe disease than group 1, and is less susceptible 
to interferon treatment (Pozzato et al., 1991). There are,
as yet, no data to link virus type with different sources of 
infection (particularly non-parenteral infection). Our 
own preliminary investigations have shown that infec- 
tion with HCV type 3 is more strongly associated with 
previous intravenous drug misuse than types 1 and 2 
(unpublished data). 

The degree of sequence variability found between 
HCV types would be expected to affect profoundly the 
antigenicity of many of the putative proteins of HCV. 
We have previously shown that sera from blood donors 
infected with different HCV types show distinct differ- 
ences in the pattern of reactivity to a range of structural 
and non-structural proteins in two commercial immuno- 
blot assays for HCV antibody (Ortho RIBA and
Innogenetics LIA; Chan et al., 1991). In particular, no
reactivity with C-100 and infrequent reactivity with 
C33c were observed in patients infected with HCV types 
2 and 3, presumably reflecting the high degree of 
sequence variability in the NS-3 and NS-4 regions of the 
genome. Reactivity, however, was always found with the 
core protein, which is consistent with the degree of 
sequence conservation in this region (Table 3). 

This then provides at least one explanation for the 
observation that blood donor screening with the original 
C-100-based immunoassay reduced the incidence but did 
not entirely prevent post-transfusional NANB hepatitis 
(Esteban et al., 1990; Japanese Red Cross Non-A, 
Non-B Hepatitis Research Group, 1991). The use of 
second generation tests that incorporate core proteins 
will undoubtedly increase the effectiveness of blood 
donor screening, although the most effective test for 
HCV infection would be an assay incorporating representative
antigens from all three HCV types. In addition,
an immunoblot assay that included polypeptides corresponding
to the C-100 protein of HCV types 1, 2 and 3
may serve to type infected individuals serologically by
virtue of the apparent type-specific serological reactivity
to this variable protein.
Conservation of Hepatitis C Virus 5' Untranslated Sequences
in Hepatocellular Carcinoma and the Surrounding Liver 

Deborah E. Sullivan and Michael A. Gerber 

Department of Pathology and Laboratory Medicine, Tulane University School of Medicine, New Orleans, Louisiana 70112 



Persistent infection by hepatitis C virus is a major
risk factor for the development of hepatocellular
carcinoma, but the mechanism of hepatocarcinogenesis
is unknown. To study the association of hepatitis
C virus with hepatocellular carcinoma, we sequenced
part of the 5' untranslated region of hepatitis
C virus from the tumor tissue and the surrounding
nontumorous liver of three patients with hepatocellular
carcinoma. No sequence differences between
tumor-derived and liver-derived hepatitis C virus isolates
were detected. The conservation of the 5' untranslated
region of hepatitis C virus, not only in infected
hepatocytes but also in neoplastic cells, suggests that
the regulatory elements at the 5' terminus of the viral
genome play an important role in the pathobiology of
hepatitis C virus. (Hepatology 1994;19:551-553.)

Hepatitis C virus (HCV) is a major pathogenic agent of
chronic hepatitis, which often leads to cirrhosis and hepatocellular carcinoma (HCC)
(1). The virus contains a positive-stranded RNA genome 
of approximately 9,400 nucleotides and presumably 
replicates through formation of negative-stranded RNA 
intermediates (2). Several complete sequences of HCV 
have been reported, revealing a single large translational 
open reading frame preceded at the 5' end by a relatively 
long untranslated region (5' UTR) (3, 4). This 5' 
terminal region is 324 to 340 nucleotides long and 
represents the most highly conserved sequence among 
different HCV isolates (5). Recent studies suggest that 
the 5' UTR of HCV contains positive and negative 
translational control elements and may play a role in the 
pathobiology of chronic HCV infection (6). 

Many reports from around the world have demonstrated a
close association between chronic HCV infection and the
development of HCC. The mechanism of malignant
transformation, however, is not known. We and others
recently demonstrated positive-stranded and negative-stranded
HCV RNA sequences in both chronically
infected liver tissue and HCC tissue (7-9). To further
elucidate the pathobiology of HCV, we cloned and
sequenced part of the 5' UTR of HCV from the tumor
tissue and the surrounding nontumorous liver of three
patients with HCC.

PATIENTS AND METHODS 

Patients. Three male, HCV-positive patients (62, 52 and 74 
yr) with chronic hepatitis C, cirrhosis and HCC were selected 
for this study (patients 3, 10 and 12 of reference 7). HCV 
antibody (anti-HCV) was detected in the serum of one patient, 
and HCV RNA untranslated sequences were present in the 
liver and tumor tissues of all three patients. In our larger series 
of patients (7), the correlation between the presence of 
anti-HCV in serum and HCV sequences in liver was much 
better (p < 0.025) than in the three patients selected for study 
here. All HBV serological markers were negative. The tissues
were obtained from the liver explants at the time of orthotopic 
liver transplantation. 

Extraction of RNA from Liver and Tumor Tissues. Two 
pathologists selected tissue samples by means of histological 
examination of frozen sections to represent only liver or only 
HCC. Then the selected fresh frozen tissues (1 mg) were 
homogenized in a guanidine solution (4 mol/L guanidinium
thiocyanate, 25 mmol/L sodium citrate [pH 7.0], 0.5% sarcosyl,
0.1 mol/L 2-mercaptoethanol) with a glass homogenizer at
4° C (10). Sodium acetate was added to yield a final concentration
of 0.2 mol/L, and the lysate was extracted with phenol/chloroform
(1:1). The aqueous phase was collected after
centrifugation at 14,000 rpm for 10 min at 4° C and extracted
with an equal volume of chloroform. We precipitated RNA
from the aqueous phase by adding 1 vol of isopropanol and
keeping the sample at -70° C for at least 1 hr. RNA was
collected by means of centrifugation at 14,000 rpm for 30 min
at 4° C. The RNA pellet was washed with 70% ethanol,
air-dried and then resuspended in 10 to 20 μl of sterile,
nuclease-free water. RNA samples were aliquoted and stored at
-70° C until their use.
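The sodium acetate step specifies only the final concentration (0.2 mol/L), not the stock concentration or the lysate volume. The volume of stock to add follows from C_final × (V0 + v) = C_stock × v. A Python sketch; the 2 mol/L stock and the 500 μl lysate volume in the example are hypothetical values, not taken from the protocol:

```python
def stock_volume_to_add(initial_volume: float, stock_molar: float, final_molar: float) -> float:
    """Volume of salt stock to add (same units as initial_volume) to reach final_molar.

    Solves final_molar * (V0 + v) = stock_molar * v for v, assuming the
    starting solution contains none of the salt.
    """
    if not 0 < final_molar < stock_molar:
        raise ValueError("final concentration must be positive and below the stock concentration")
    return final_molar * initial_volume / (stock_molar - final_molar)


# Hypothetical example: 500 ul lysate, 2 mol/L sodium acetate stock, 0.2 mol/L target.
v = stock_volume_to_add(500.0, 2.0, 0.2)
print(f"add {v:.1f} ul of stock")  # add 55.6 ul of stock
```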

Reverse Transcription-Polymerase Chain Reaction. The
sequences of the primers for the UTR were as follows: for the
first polymerase chain reaction (PCR), the sense primer was
5'-CTGTGAGGAACTACTGTC-3' and the antisense primer
was 5'-CACTACTCGGCTAGCAGT-3'; for the second PCR,
the sense primer was 5'-CACGCAGAAAGCGTCTAG-3' and
the antisense primer was 5'-TTGATCGAAGAAAGGACCC-3'.
The expected lengths of first- and second-round PCR products
were 219 and 142 bases, respectively. The sequence of the
oligomer probe used was 5'-AGTATGAGTGTCGTGCAGCCTCCAGGA-3'.
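As a consistency check on the primer and probe sequences, the inner sense primer should begin the 142-base second-round product, the reverse complement of the inner antisense primer should end it, and the probe should lie within it. The Python sketch below tests this against the HCV-1 prototype segment (positions -277 to -136) shown in Fig. 1; the helper names are illustrative. With the sequences as printed, it reports one mismatch at the antisense site, near the 5' end of that primer:

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase ACGT DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def mismatches(a: str, b: str) -> int:
    """Count position-by-position differences between equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


# HCV-1 5' UTR, positions -277 to -136 (the three lines of Fig. 1).
hcv1_segment = (
    "CACGCAGAAAGCGTCTAGCCATGGCGTTAGTATGAGTGTCGTGCAGC"
    "CTCCAGGA" "CCCCCCC" "TCCCGGGAGAGCCATAGTGGTCTGCGGAACCG"
    "GTGAGTACACCGGAATTGCCAGGACGACCGGGTCCTTTCTTGGATCAA"
)
inner_sense = "CACGCAGAAAGCGTCTAG"
inner_antisense = "TTGATCGAAGAAAGGACCC"
probe = "AGTATGAGTGTCGTGCAGCCTCCAGGA"

print(len(hcv1_segment))                      # 142
print(hcv1_segment.startswith(inner_sense))   # True
site = hcv1_segment[-len(inner_antisense):]   # 3' end of the sense strand
print(mismatches(site, reverse_complement(inner_antisense)))  # 1
print(probe in hcv1_segment)                  # True
```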
HCV-1   -277  CACGCAGAAAGCGTCTAGCCATGGCGTTAGTATGAGTGTCGTGCAGC
HCV-1   -230  CTCCAGGACCCCCCCTCCCGGGAGAGCCATAGTGGTCTGCGGAACCG
HCV-1   -183  GTGAGTACACCGGAATTGCCAGGACGACCGGGTCCTTTCTTGGATCAA
Li-12, Tu-12              C---G---T (within -183 to -136)
Li-3, Tu-3, Li-10, Tu-10  identical to HCV-1

Fig. 1. Nucleotide sequence of the 5' UTR of the HCV-1 prototype in comparison to sequences of HCV RNA from liver (Li) or tumors (Tu) of three patients.



We performed first-strand cDNA synthesis by mixing 500
ng liver or tumor-derived total RNA with 500 ng (80 pmol) of
the outer antisense primer, denaturing at 70° C for 10 min and
annealing at room temperature (7). Ten units of avian
myeloblastosis virus reverse transcriptase (Promega Corp.,
Madison, WI) were added and incubated at 42° C for 60 min in
the presence of 36 units of RNasin (Ambion, Austin, TX),
1 mmol/L each of the four dNTPs (Promega Corp.), and PCR
buffer (50 mmol/L KCl, 10 mmol/L Tris-HCl [pH 9.0], 1.5
mmol/L MgCl2, 0.01% gelatin, 0.1% Triton X-100). The cDNA
product was boiled for 10 min and chilled on ice, and 10 μl was
amplified in a total volume of 50 μl containing 250 ng (40 pmol)
of each of the outer primers, 200 μmol/L dNTPs, PCR buffer
and 2.5 units of Taq polymerase (Promega Corp.). The reaction
was performed in a programmable DNA thermal cycler
(Ericomp, San Diego, CA) for 30 cycles consisting of denaturation
at 94° C for 1 min, primer annealing at 55° C for 1 min
and primer extension at 72° C for 1 min. Five microliters of
PCR product from the first amplification was used as template
for a second amplification of 30 cycles (same temperature
profile as first amplification) with the inner primer pair. After
the second amplification, 10 μl of the PCR product was
analyzed by means of electrophoresis on a 2% agarose gel
containing ethidium bromide and visualized with UV fluorescence.
Specificity of the PCR reaction was confirmed with
Southern-blot analysis and hybridization to a 32P-labeled
HCV-UTR-specific oligomer probe (see above) under stringent
conditions.
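Primer amounts are quoted both by mass and in picomoles. The conversion depends on the oligonucleotide's molecular weight, which can be estimated from base composition. A Python sketch using a commonly quoted approximation for an unmodified single-stranded DNA oligonucleotide without a 5' phosphate (the per-base masses and the 61.96 correction belong to that approximation, not to this paper):

```python
# Approximate anhydrous molecular weight (g/mol) contributed by each base
# in an unmodified ssDNA oligonucleotide; an approximation, not exact.
BASE_MASS = {"A": 313.21, "T": 304.2, "C": 289.18, "G": 329.21}


def oligo_mw(seq: str) -> float:
    """Estimated molecular weight (g/mol) of an oligonucleotide from its composition."""
    return sum(BASE_MASS[base] for base in seq.upper()) - 61.96


def ng_to_pmol(mass_ng: float, seq: str) -> float:
    """Convert a mass in ng to pmol: ng / (g/mol) gives nmol, times 1000 gives pmol."""
    return mass_ng / oligo_mw(seq) * 1000.0


outer_antisense = "CACTACTCGGCTAGCAGT"
print(f"{oligo_mw(outer_antisense):.1f} g/mol")        # 5459.6 g/mol
print(f"{ng_to_pmol(500, outer_antisense):.1f} pmol")  # 91.6 pmol
```

For the 18-base outer antisense primer this estimate gives roughly 92 pmol per 500 ng, somewhat above the quoted 80 pmol; the quoted figure presumably reflects a different average nucleotide mass.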

Cloning of PCR Products. Specific second-round PCR
products corresponding to nucleotides -277 to -136 were
gel-purified (11) and ligated directly into a T-vector prepared
according to the method of Marchuk et al. (12). pBluescript II
SK plasmid (Stratagene, La Jolla, CA) was digested with SmaI
(Promega Corp.) and incubated with Taq polymerase (1
unit/μg plasmid/20 μl volume) in PCR buffer (60 mmol/L KCl,
10 mmol/L Tris-HCl [pH 9.0], 1.5 mmol/L MgCl2, 0.01%
gelatin, 0.1% Triton X-100) containing 2 mmol/L dTTP
(Promega Corp.) for 2 hr at 70° C. After phenol/chloroform
(1:1) extraction and ethanol precipitation, 100 ng of T-vector
and 12 ng of PCR product were ligated at 14° C overnight. The
ligation mixture was dialyzed against sterile dH2O for 1 hr in
a microdialyzer (13) with two dialysate changes; 1 μl was used
to transform competent DH5-alpha Escherichia coli by means
of electroporation. We identified positive clones by plating on
LB, ampicillin, and X-gal plates and screening of white colonies
by means of PCR using HCV-UTR-specific primers.
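The ligation combined 100 ng of T-vector with 12 ng of the 142 bp product. Converting both masses to moles gives the insert-to-vector molar ratio. A Python sketch, assuming about 650 g/mol per base pair and a vector length of 2961 bp for pBluescript II SK (neither figure is stated in the text):

```python
AVG_BP_MASS = 650.0  # approximate g/mol per base pair of double-stranded DNA (assumption)


def pmol_dsdna(mass_ng: float, length_bp: int) -> float:
    """Approximate picomoles of linear double-stranded DNA from mass and length."""
    return mass_ng * 1000.0 / (length_bp * AVG_BP_MASS)


vector_pmol = pmol_dsdna(100.0, 2961)  # assumed length of pBluescript II SK, in bp
insert_pmol = pmol_dsdna(12.0, 142)    # second-round 5' UTR product, 142 bp
ratio = insert_pmol / vector_pmol
print(f"insert:vector molar ratio = {ratio:.1f}:1")  # insert:vector molar ratio = 2.5:1
```

Under these assumptions the insert is in roughly 2.5-fold molar excess. The per-base-pair mass cancels in the ratio, so only the two lengths matter.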

Positive clones were sequenced with Sequenase (U.S. Bio- 
chemical Corp., Cleveland, OH) and standard conditions for 
sequencing of double-stranded plasmid DNA. Sequencing 
reactions were analyzed on a 6% denaturing sequencing gel. 

Direct Sequencing of PCR Products. Second-round PCR
products for sequencing were purified by means of gel
electrophoresis through low melting point agarose and then
eluted (11). The specific band was excised, melted in 3 vol
sterile distilled H2O at 65° C for 10 min and frozen at -70° C.
After thawing, the mixture was centrifuged at 14,000 rpm for
10 min to pellet the agarose, and the supernatant containing
the HCV PCR product was collected.

The double-stranded PCR products were sequenced with
Sequenase (U.S. Biochemical Corp.) under the conditions
recommended by Casanova et al. (14). Specifically, 0.25 pmol
of PCR product was denatured in the presence of 5 pmol PCR
primer and reaction buffer (U.S. Biochemical Corp.) by boiling
in 10 μl for 10 min. The mixture was immediately cooled at
-70° C for 15 sec, rapidly mixed with 5.5 μl labeling mix (U.S.
Biochemical Corp.) and incubated at room temperature for 15
sec. We carried out termination reactions by transferring 3.5
μl of the labeling reaction to each of four wells of a microtiter
plate containing 2.5 μl of the respective ddNTP mixture (U.S.
Biochemical Corp.) and incubating at 37° C for 2 min. The
reaction was stopped with 4 μl of stop solution (U.S. Biochemical
Corp.). Sequencing reactions were heat-denatured at
80° C for 3 min and analyzed on a 6% denaturing sequencing
gel.

RESULTS AND DISCUSSION 

Through a combination of phylogenetic, thermody- 
namic and biochemical studies, Brown et al. derived a 
model of the secondary structure of the 5' UTR of HCV 
(15). It revealed several secondary structural elements 
such as hairpins (stem/loop), bubbles and bulges that 
may bind regulatory proteins (16). In addition, the 5' 
UTR sequence represents the most conserved region of 
the HCV genome. Therefore we selected a portion of this
region for sequence analysis and comparison of the HCV 
genomes isolated from HCC tissue and the surrounding 
liver of the same patients. Bukh and coworkers (17) 
described three domains of significant heterogeneity in 
the 5' UTR of HCV. The region sequenced here 
corresponds to nucleotides - 277 to - 136 and spans two 
of these domains. Nevertheless, we emphasize that only 
142 nucleotides of the 341 nucleotides of the 5' UTR
were sequenced and that the remainder of the 5' leader 
should be examined. 

Clearly separable specimens of tumor and liver were 
available in the three cases selected. The 5' UTR of the 
extracted HCV RNA was reverse transcribed and the 
resulting cDNA was amplified by means of nested (two-round) PCR.
All cases contained amplifiable HCV 5' UTR sequences,
as confirmed with Southern-blot analysis using an HCV-specific
oligomer hybridization probe. The 142 base pair
PCR products derived from the liver and tumor of case
12 were each cloned into pBluescript II SK+ and
sequenced. Comparison of the nucleotide sequences
demonstrated no differences between tumor- and liver-derived
HCV isolates. When compared with that of the
HCV-1 prototype (5), the nucleotide sequences were 97%
identical (Fig. 1) and 100% identical to isolates
Z4, Z6, and Z7, as reported by Bukh et al. (16).

PCR products derived from cases 3 and 10 were 
sequenced directly. Again, we found no difference 
between tumor- and liver-derived HCV isolates. The 
nucleotide sequences of HCV isolates from both cases 
were 100% identical to the HCV-1 prototype (Fig. 1).
Nucleotide sequence information from all HCV isolates
was confirmed by sequencing of the PCR
products from three independent reverse transcription-PCR
reactions.

These findings suggest that mutations of the 5' UTR
of HCV, which appears to contain major regulatory
control elements, do not play a role in malignant
transformation of hepatocytes. Furthermore, there is no
evidence that the HCV genome is reverse-transcribed
and integrated into the host cell genome (7, 8). In
contrast, during chronic HBV infection HBV DNA
undergoes major alterations and is integrated into the
hepatocyte genome during hepatocarcinogenesis (18).
In addition, the transcriptional transactivator X
protein and the truncated pre-S/S gene may play a role
in malignant transformation in HBV infection (18).

In a recent study, Paterlini et al. (8) sequenced a 
hypervariable region of the E2/NS1 gene of HCV from 
the serum, liver tissue and tumor tissue of two patients. 
The sequences obtained from each sample were different 
and demonstrated several mutations in the same pa- 
tient. The E2/NS1 region exhibits significant diversity 
among different viral isolates, even those of the same 
HCV strain. In contrast, our findings indicate that the 5'
UTR is highly conserved among HCV isolates recovered
from tumor and liver tissues of the same and different
patients. The findings suggest that the regulatory
elements of the 5' UTR may play an important role in
the life cycle and persistence of HCV in hepatocytes and
during hepatocarcinogenesis.
of five additional subtypes

(genotyping/line probe assay/taxonomy)

LIEVEN STUYVER*†, WOUTER VAN ARNHEM*, ANN WYSEUR*, FRANCISCO HERNANDEZ*, ERIC DELAPORTE‡,
AND GEERT MAERTENS*

*Innogenetics, Industriepark 7, Box 4, B-9052 Ghent, Belgium; and ‡Institut National de la Santé et de la Recherche Médicale Unité 13, 190 Boulevard
MacDonald, 75019 Paris, France



Communicated by Robert H. Purcell, June 1, 1994 

ABSTRACT Genotyping of hepatitis C virus-positive sera 
by means of a line probe assay indicated that <3% of European 
samples, but up to 30% of Gabonese sera, could not be 
classified as either 1a, 1b, 2a, 2b, 3a, 3b, 4c, 5a, or 6a. Such
samples were analyzed in the 5' untranslated region and in the
nonstructural 5 (NS5) region. Classification based on phylogenetic
analysis of the commonly used 222-bp-long NS5B
region was possible for most but not all of the selected sera. 
Therefore, the core/envelope 1 region (579 bp) and a larger 
NS5B (340 bp) region were also analyzed. Only the phyloge- 
netic analysis of the 340-bp NS5B region of these newly 
identified and published isolates provided unambiguous clas- 
sification into types and subtypes. Furthermore, unequivocal 
evidence for four subtypes in type 2 and eight subtypes in type 
4 was provided. A specific recognition sequence in the 5' 
untranslated region was observed for every newly identified 
subtype. Based on 1830 pair-wise comparisons in NS5B, iso- 
lates belonging to the same subtype showed evolutionary dis- 
tances of <0.127 and isolates of the same type exhibited 
evolutionary distances of <0.328. These phylogenetic border 
distances can be conveniently used for classification of hepatitis 
C virus isolates into types and subtypes. 



Hepatitis C virus (HCV) is thought to be the causative agent 
of most non-A, non-B hepatitis cases. A very high number of 
HCV-infected patients develop chronic hepatitis, which of- 
ten results in liver cirrhosis and occasionally progresses to 
hepatocellular carcinoma (1). DNA complements of the com- 
plete RNA genome of HCV have been cloned (2-5) and show 
an organization comparable to those of the genomes of 
pestiviruses and flaviviruses (6). Within the HCV genus, a 
high degree of sequence heterogeneity exists. Four groups of 
complete genomes have been reported. HCV-1 (3), HC-J1
(7), and HCV-H (8) belong to a group designated as type I (9),
PT (10), or 1a (11, 12). The second group contains at least
eight complete genomes: HCV-J (2), HCV-JK1 [GenBank
accession no. (Acc.) X61596], HCV-China (Acc. L02836),
HCV-T (13), HC-J4 (14), HCV-TA (15), HCV-BK (16), and
HCV-JT (15), and was described as either type II (9), K1 (10),
or 1b (11, 12). The third group contains only one prototype
sequence, HC-J6 (4), described as type III (9), K2a (10), or
2a (11, 12). Group four, called type III (9), type IV (17), type
K2b (10), or type 2b (11, 12), is represented by HC-J8 (5).
More groups could be identified based on homology calculations
in the nonstructural 5 (NS5), the core, and the
envelope 1 (E1) regions, but also based on variations in the 5'
untranslated region (5' UR). These included type 3a (11, 12)
or type IV (9) or type V (17); type 3b or type VI (17); type 4
(12, 18-21); type 5, or type V (9); and type 6 (19-21). A
proposal for classification (11, 12) has now been widely
accepted by scientists in the HCV variability field (22). We
have previously reported a convenient system (12) for detection
of the major types and subtypes of HCV based on
genotype-specific variations in the 5' UR. Attempts have
already been undertaken to recognize and classify the major 
genotypes into serotypes (23-26) but further research is 
needed before serotyping may become widely available, and 
serological subtyping may be difficult (26). 

HCV exists in vivo as a mixture of slightly different 
genomes, called quasispecies (27). The mutation rate was
estimated to be 1.92 × 10^-3 (H strain) and 1.44 × 10^-3
(HC-J4) base substitutions per site per year (14, 28). The 5'
UR and the NS4B region are the most conserved, while the
E1 and E2 regions show more variability (14). The evolutionary
algorithm underlying phylogenetic analysis is based
on the "neutral theory" (29), which claims that mutations 
occur randomly and are functionally almost neutral. The 
program dnadist of the phylogenetic analysis package 
phylip (30) computes a phylogenetic distance between two 
different nucleotide sequences, analyzing all positions, in- 
cluding those that apparently did not change. The obtained 
phylogenetic distance is represented by the sum of horizontal 
branch lengths from one isolate to another through the 
phylogenetic tree. 
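Under the Kimura two-parameter setting that this study uses with dnadist, the distance treats transitions (A↔G, C↔T) separately from transversions. In its classic closed form, with P and Q the proportions of transition and transversion differences over all compared sites, d = -(1/2) ln(1 - 2P - Q) - (1/4) ln(1 - 2Q). The Python sketch below implements that formula under stated assumptions: the sequences are already aligned, gapped or ambiguous positions are skipped, and PHYLIP's own implementation may differ in detail (for example in how the transition/transversion ratio is handled), so values need not match dnadist exactly. The two sequences are hypothetical:

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def kimura_2p(seq_a: str, seq_b: str) -> float:
    """Closed-form Kimura two-parameter distance between two aligned DNA sequences.

    Positions where either sequence has a gap or a non-ACGT character are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    sites = transitions = transversions = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue
        sites += 1  # unchanged sites are counted too
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1  # A<->G or C<->T
        else:
            transversions += 1  # purine <-> pyrimidine
    if sites == 0:
        raise ValueError("no comparable sites")
    p = transitions / sites
    q = transversions / sites
    w1 = 1.0 - 2.0 * p - q
    w2 = 1.0 - 2.0 * q
    if w1 <= 0.0 or w2 <= 0.0:
        raise ValueError("sequences too divergent for the Kimura correction")
    return -0.5 * math.log(w1) - 0.25 * math.log(w2)


# Hypothetical 20-nt sequences: one transition (A->G) and one transversion (C->A).
seq1 = "ACGTACGTACGTACGTACGT"
seq2 = "GAGTACGTACGTACGTACGT"
print(f"{kimura_2p(seq1, seq2):.4f}")  # 0.1076
```

The uncorrected difference here is 2/20 = 0.1; the correction raises it slightly to allow for unobserved multiple substitutions.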

In this study, we selected HCV-seropositive samples from 
Gabon, Cameroon, Belgium, and The Netherlands, which 
were subjected to genotype analysis by a line probe assay 
(LiPA). Aberrantly reactive samples were selected for sequencing
and phylogenetic analysis in the 5' UR, the core/E1
region, and the NS5B region.§ The recognized types and
subtypes (including newly identified genotypes 2d and 4e-4h)
contained specific sequence motifs in the 5' UR. The results
of the phylogenetic analysis of the core/E1 (C/E1) and NS5B
regions were comparable, but only results obtained from the 
340-bp NS5B region were completely conclusive. 

MATERIALS AND METHODS 

Serum Samples. Serum samples from Gabon (GB) and 
Cameroon (CAM) were collected previously (31), screened 
for HCV antibodies with Innotest HCV antibody (Ab) II, and 
confirmed by INNO-LIA HCV Ab II and III (Innogenetics, 
Antwerp, Belgium). HCV antibody-confirmed samples were 
subsequently genotyped by LiPA (12). Serum samples from 



Abbreviations: HCV, hepatitis C virus; 5' UR, 5' untranslated
region; LiPA, line probe assay; E1, envelope 1; NS, nonstructural;
Acc, GenBank accession no.
†To whom reprint requests should be addressed.
§The sequences reported in this paper have been deposited in the 
GenBank data base (accession nos. L29574-L29635). 
Belgium were obtained from blood donors (BE90, BE96,
BE97, and BE100), which were confirmed anti-HCV positive 
by INNO-LIA HCV Ab III. Serum BE95 was obtained from 
a child with leukemia and a history of blood transfusion. 
Serum samples from The Netherlands (NE91, NE92, and 
NE93) were obtained from chronic HCV patients. 

PCR, LiPA, Cloning, and Sequencing. RNA isolation, 
cDNA synthesis, PCR, cloning, and LiPA genotyping using 
biotinylated 5' UR amplification products were performed as 
described (12, 24). The core/E1 and NS5B PCR products
were used for direct sequencing and cloning. The 5' UR
sequences of NE91, NE92, and NE93 were obtained from a
study in which LiPA typing was confirmed by sequencing
(32). The sequences of the universal 5' UR primers and of
HCPr54 have been described (12, 24). The sequence of
primer HCPr52 (+) was 5'-TTGGGTAAGGTCATCGATACCCT-3'.
The core/E1 region was amplified with primers
HCPr52-54, and the NS5B region was amplified as described
(10). 

Phylogenetic Analysis. Previously published sequences 
were taken from the EMBL/GenBank data base. Corre- 
sponding accession numbers are given in the Introduction or 
in the references in the phylogenetic trees. Alignments were 
created using the program hcv align (F.H., unpublished 
data). Sequences were presented in a sequential format to the 
Phylogeny Inference Package (phylip) version 3.5c (30). 
Distance matrices were produced by dnadist using the 
Kimura two-parameter setting and further analyzed in 
neighbor, using the neighbor-joining setting. The program 
drawgram was used to create a graphic output. 
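The tree-building step can be illustrated with a textbook neighbor-joining procedure (Saitou and Nei's method): repeatedly join the pair of nodes that minimizes Q(i, j) = (n - 2)d(i, j) - r(i) - r(j), where r(i) is the sum of row i of the distance matrix, then replace the pair with a new node. This Python sketch is not PHYLIP's neighbor program; tie-breaking, negative branch lengths and the unrooted Newick output are simplified, and the five-taxon matrix is a hypothetical additive example rather than HCV data:

```python
def neighbor_joining(names, dist):
    """Textbook neighbor-joining on a symmetric distance matrix.

    names: taxon labels; dist: square list of lists of pairwise distances.
    Returns an unrooted tree as a Newick string with branch lengths.
    """
    if len(names) < 3:
        raise ValueError("neighbor-joining needs at least three taxa")
    nodes = list(names)
    d = [list(map(float, row)) for row in dist]
    while len(nodes) > 3:
        n = len(nodes)
        totals = [sum(row) for row in d]
        # Choose the pair (i, j) minimizing Q(i, j) = (n - 2) d(i, j) - r(i) - r(j).
        best = None
        for i in range(n):
            for j in range(i + 1, n):
                q = (n - 2) * d[i][j] - totals[i] - totals[j]
                if best is None or q < best[0]:
                    best = (q, i, j)
        _, i, j = best
        # Branch lengths from the new internal node to i and to j.
        li = 0.5 * d[i][j] + (totals[i] - totals[j]) / (2 * (n - 2))
        lj = d[i][j] - li
        joined = f"({nodes[i]}:{li:.4f},{nodes[j]}:{lj:.4f})"
        keep = [k for k in range(n) if k not in (i, j)]
        # Distance from the new node to every remaining node.
        new_row = [0.5 * (d[i][k] + d[j][k] - d[i][j]) for k in keep]
        d = [[d[r][s] for s in keep] + [new_row[idx]] for idx, r in enumerate(keep)]
        d.append(new_row + [0.0])
        nodes = [nodes[k] for k in keep] + [joined]
    # Resolve the last three nodes around a single central node.
    a, b, c = nodes
    la = 0.5 * (d[0][1] + d[0][2] - d[1][2])
    lb = 0.5 * (d[0][1] + d[1][2] - d[0][2])
    lc = 0.5 * (d[0][2] + d[1][2] - d[0][1])
    return f"({a}:{la:.4f},{b}:{lb:.4f},{c}:{lc:.4f});"


labels = ["a", "b", "c", "d", "e"]
matrix = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
print(neighbor_joining(labels, matrix))
# (d:2.0000,e:1.0000,(c:4.0000,(a:2.0000,b:3.0000):3.0000):2.0000);
```

Because the toy matrix is additive, the recovered branch lengths reproduce every input distance; for example, a to b is 2 + 3 = 5.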

RESULTS 

Serum Selection. Using the LiPA, ≈500 anti-HCV positive
serum samples were genotyped. Genotypes 4 and 5 were
detected in up to 60% of the Gabonese samples, but these
so-called "African genotypes" were also detected in 5-15%
of Benelux samples. Based on these LiPA results, <3% of the
Benelux samples, but 12 of 39 (31%) of the Gabonese samples, could
not be typed or subtyped. However, the presence of HCV
RNA was confirmed by means of the LiPA system. The 5'
UR sequence of these exceptional cases was determined; for
some selected samples, the core/E1 and NS5B regions were
analyzed.

The 5' UR. Fig. 1 shows the 5' UR sequences of 16 of the 
selected isolates compared with prototype sequences. Isolate 
BE90 did not hybridize with one of the typing probes because 
of variation at positions -159 and -127. Serum NE92 only 
reacted with the type 2-specific probes in the LiPA but could 
Fig. 1. 5' UR sequences of selected isolates compared with prototype sequences.

HCV-1   1a  GTCGTGC TGCCAGGACGACCG
HCV-J   1b  --T
BE90    1b  T
HC-J6   2a  A G---A---T-
HC-J8   2b  A- -A--G-A-A---T-
NE91    2b  A G-A-A---T-
S83     2c  A G---A---T-
NE92    2d  A A---T-
BR56    3a  C--TG--GT
NE93    3a  C--TG--GT
HCV-TR  3b  C---G---T
Z4      4a  --T--A- C
Z1      4b  --T--A- C---G---T
Z6      4c  --T--A- C--G---T
GB48    4c  --T--A- C---G---T
GB116   4c  --T--A- C---G---T
GB358   4c  --T--A- C---G---T
DK13    4d  --T--A- C---G---T
GB809   4e  --T--A- C---GA--T
CAM600  4e  --T--A- C---GA--T
CAM736  4e  --T--A- C---GA--T
CAMG22  4f  --T--A G
CAMG27  4f  --T--A G---T
GB549   4g  --T C---G
GB438   4h  A- C---G---T
SA1     5a  AA G---T
BE95    5a  AA G---T
BE96    5a  AA G
HK2     6   A T



not be subtyped on the 2a and 2b probes and was therefore 
selected for further analysis. BE96 could not be typed 
because of variations at positions -163 and -122. In total, 10 
newly identified type 4-related sequences are shown. Only 
GB549, GB809, CAM600, CAM736, CAMG22, and 
CAMG27 could not be recognized by means of the LiPA. 
GB438 was recognized as type 4 but is different from other 
type 4 sera due to a variation at position -238. The three 
other Gabonese (GB48, GB116, and GB358) isolates were 
included for reasons of comparison. 

The E1 and NS5B Regions. Primer set HCPr52-HCPr54
permitted the PCR amplification of DNA fragments from the
core/E1 region of 12 sera. Such a PCR fragment includes 64
codons of the core 3' terminus and 127 codons of E1 with a
total length of 579 bp. With the NS5 primer set (10), 340-bp-long
PCR fragments were obtained from 14 different sera.
PCR fragments were subjected to either cloning, direct
sequencing, or both. Fig. 2 shows the alignment of the
deduced amino acid sequences of the recombinant core/E1
clones. The previously indicated variable regions V1-V5 and
the hydrophobic domain (24) are depicted. The eight cysteine
residues and the four N-glycosylation sites are conserved. 

Phylogenetic Analysis. Four different groups of sequences 
were analyzed. Group one contained 71 core/E1 sequences
of 579 bp (2485 pairwise comparisons), group two contained
134 E1 sequences of 384 bp (8911 comparisons), group three
contained 61 NS5B sequences of 340 bp (1830 comparisons), 
and group four contained 115 NS5B sequences of 222 bp 
(6555 comparisons). The border values of phylogenetic dis- 
tances of each group are given in Table 1. Only the analysis 
of the 340-bp NS5B region resulted in three nonoverlapping 
distance ranges. Phylogenetic trees were created from the 
DNA distance matrices (Fig. 3 and refs. 33 and 34). Phylo- 
genetic distances obtained from two different regions allowed 
us to classify isolate NE92 as subtype 2d, serum GB809 and 
CAM600 as subtype 4e, CAMG22 as subtype 4f, GB549 as 
subtype 4g, and GB438 as subtype 4h. When comparing 
direct sequencing results with sequences obtained from re- 
combinant clones, the phylogenetic distance between both 
was always less than the distance between two different 
isolates belonging to the same subtype. This was demon- 
strated for most of the investigated sera in the NS5B region 
and in the El region. Only one exception in serum GB809 was 
found, where clone GB809.4 was significantly different from 
the direct sequence and from clone GB809.2. The mean 
phylogenetic distance of 0.2340 provided evidence that the 
two core/El clones were neither derived from the same 
quasispecies nor from the same subtype, but rather from 
different type 4 subtypes; serum GB809 is therefore coinfected.
Based on the direct sequence of the E1 region, the
newly identified subtype 4e was predominant and related
to the CAM600 E1 sequences (distance of 0.0637; Fig. 3A).
As the sequences of the two recombinant NS5 clones
(GB809_3_1 and GB809J.2) are related to the direct sequence
of CAM600 (mean distance of 0.0594; Fig. 3C), these
clones represent subtype 4e of the coinfection.

Fig. 2. Deduced amino acid sequence alignment of the HCV E1 region [positions 192-319 (3)]. V1-V5, variable regions 1-5 (24).
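Two pieces of arithmetic recur in these results. The number of pairwise comparisons for n sequences is n(n - 1)/2 (for example, 61 × 60 / 2 = 1830), and the NS5B border distances given in the abstract (<0.127 within a subtype, <0.328 within a type) act as a decision rule. A Python sketch; assigning a distance exactly at a border to the more distant class, and classifying single pairwise distances rather than whole clusters, are simplifying assumptions:

```python
from math import comb

# Border distances for the 340-bp NS5B region, as given in the abstract.
SUBTYPE_BORDER = 0.127
TYPE_BORDER = 0.328


def classify_pair(distance: float) -> str:
    """Place one pairwise NS5B evolutionary distance into a type/subtype range."""
    if distance < 0:
        raise ValueError("distance must be non-negative")
    if distance < SUBTYPE_BORDER:
        return "same subtype"
    if distance < TYPE_BORDER:
        return "same type, different subtype"
    return "different types"


# Pairwise comparisons for the four sequence groups: n * (n - 1) / 2.
print({n: comb(n, 2) for n in (71, 134, 61, 115)})
# {71: 2485, 134: 8911, 61: 1830, 115: 6555}

print(classify_pair(0.0594))  # same subtype
```

The NS5 clones of serum GB809 (mean distance 0.0594 to CAM600) fall in the within-subtype range, consistent with their assignment to subtype 4e.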

DISCUSSION 

The recently described LiPA technology (12) enabled us to 
rapidly genotype serum samples obtained from different 
geographical regions. Sequences of at least 60 5' UR PCR 
fragments further supported the reliability of LiPA genotyp- 
ing (32). HCV genotypes 4 and 5 have hitherto been regarded 
as the African genotypes. Type 4 has been detected in Zaire 
(21), in Burundi and Gabon (12), in Egypt (18), and also in 
Cameroon (this work), whereas genotype 5 was only reported 
from South Africa (9, 21). Here, we demonstrate consider- 
able prevalences (5-15%) of genotypes 4 and 5 in Europe as 
well, whereas they could not be found among Brazilian or 
Japanese sera (ref. 35; L.S., unpublished). Genotype 6 has 
only been reported from Hong Kong (19, 21). 

Whether sequences obtained from clones or by direct sequencing should be used for subsequent phylogenetic analysis is currently debated. Previous studies have employed one method or the other (9, 11, 19, 21), whereas this study compared both sequencing strategies. We found that minor sequence variations, such as those between clones and direct sequences, did not influence the outcome of phylogenetic analysis or classification. However, sequencing a single clone from each of two noncontiguous regions carries a potential for misinterpretation, and this should be acknowledged.
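The sketch below makes the clone-versus-direct-sequence comparison concrete. It treats a direct PCR sequence as roughly the majority-rule consensus of the clones from the same sample. That is a simplification: minority variants in a direct sequence may be unreadable or may appear as ambiguous bases. The clone sequences are hypothetical. For brevity the sketch uses an uncorrected p-distance rather than the dnadist distances of Table 1.

```python
from collections import Counter


def majority_consensus(clones: list[str]) -> str:
    """Majority-rule consensus of aligned clone sequences; tied columns become 'N'."""
    if not clones or len({len(c) for c in clones}) != 1:
        raise ValueError("need one or more clones of equal aligned length")
    consensus = []
    for column in zip(*clones):
        ranked = Counter(column).most_common()
        tied = len(ranked) > 1 and ranked[0][1] == ranked[1][1]
        consensus.append("N" if tied else ranked[0][0])
    return "".join(consensus)


def p_distance(seq1: str, seq2: str) -> float:
    """Proportion of differing sites, skipping sites where either base is not A/C/G/T."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(a, b) for a, b in zip(seq1, seq2) if a in "ACGT" and b in "ACGT"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(a != b for a, b in pairs) / len(pairs)


# Hypothetical clones from one serum sample (toy data, not from this study).
clones = ["ACGTACGTAC", "ACGTACGTAT", "ACGAACGTAC"]
consensus = majority_consensus(clones)
print(consensus, [p_distance(c, consensus) for c in clones])
```

In this toy case, each clone differs from the consensus at no more than one site. Such small clone-to-consensus distances are the kind of minor variation that, according to the text, did not alter classification.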

Isolate NE92 was compared with type 2a, 2b, and 2c isolates in the 5' UR, the core/E1 region, and the NS5 region (Fig. 3). This strain was significantly different from the type 2a and 2b prototype sequences. It also differed from type 2c isolate S83 (21) in the E1 region and from type 2c isolates Arg6, Arg8, 110, and T983 (19) in the NS5 region. NE92 can therefore be classified into a newly identified subtype, 2d. Whether the type 2c isolates described in E1 (21) and in NS5 (19) belong to the same subtype has not yet been demonstrated.

Table 1. Molecular evolutionary distances

For most type 4 subtypes, specific motifs can again be detected in the 5' UR. A basic type 4 motif is located between nucleotide positions -240 and -230, and subtype-specific motifs can be observed between positions -170 and -155 and between -125 and -115. Sequences from the 5' UR, the core/E1 region, and the NS5B region were obtained for a group of four subtype 4c isolates. In the phylogenetic analysis of each region, all of these sequences clustered together. Identical subtype 4e motifs (Fig. 1) were found in three different isolates collected in Gabon and Cameroon (GB809, CAM600, and CAM736). According to the phylogenetic analysis of the E1 and NS5B regions, these isolates belong to the same subtype (Fig. 3). Similarly, a highly specific 5' UR signature sequence was found for the subtype 4f isolates CAMG22 and CAMG27. Subtype-specific motifs were also discovered for subtypes 4g and 4h, but it cannot be predicted whether these signature sequences will be maintained in other isolates belonging to these subtypes.
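The sketch below shows how the 5' UR windows named above could be extracted and compared between isolates. It rests on two assumptions. First, the negative positions count upstream from the polyprotein AUG, with -1 the base immediately preceding it. Second, the input is an ungapped 5' UR that ends at that base. The published positions may instead refer to an alignment with gaps, which this sketch ignores. The window labels are hypothetical, and the motif sequences themselves (Fig. 1) are not reproduced here.

```python
# 5' UR windows named in the text (assumed numbering: -1 = base just before the AUG).
TYPE4_WINDOWS = {
    "type 4 motif": (-240, -230),
    "subtype motif 1": (-170, -155),  # hypothetical label
    "subtype motif 2": (-125, -115),  # hypothetical label
}


def utr_window(utr: str, start: int, end: int) -> str:
    """Bases at 5' UR positions start..end (inclusive).

    Assumes `utr` is ungapped and ends immediately before the polyprotein AUG,
    so its last base is position -1.
    """
    if not start <= end <= -1:
        raise ValueError("positions must satisfy start <= end <= -1")
    if -start > len(utr):
        raise ValueError("window extends past the 5' end of the sequence")
    return utr[len(utr) + start : len(utr) + end + 1]


def signature(utr: str) -> dict[str, str]:
    """Extract every window in TYPE4_WINDOWS from one 5' UR sequence."""
    return {label: utr_window(utr, s, e) for label, (s, e) in TYPE4_WINDOWS.items()}


def same_signature(utr1: str, utr2: str) -> bool:
    """True if two isolates carry identical bases in all signature windows."""
    return signature(utr1) == signature(utr2)
```

As the text cautions, a shared signature is only suggestive. Subtype assignment still rests on phylogenetic analysis of the coding regions.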

Biochemistry: Stuyver et al.    Proc. Natl. Acad. Sci. USA 91 (1994) 10137

Because of the large number of complete E1 sequences and the priority of publication, we followed the type 4 subtype nomenclature proposed by Bukh et al. (21). One of the subtypes in the doubly infected serum sample GB809 clustered with Z4 (subtype 4a) (Fig. 3A). Several type 4c isolates (related to Z6 and Z7) were also found in this study, but none were related to Z1 (subtype 4b) or DK13 (subtype 4d). Six other isolates were assigned to the newly identified subtypes 4e (GB809 and CAM600), 4f (CAMG22 and CAMG27), 4g (GB549), and 4h (GB438). A subtype 4a classification was also assigned to isolates originating from Egypt (18, 19). However, it is not clear whether EG-29, EG-33, and EG-21 [only partial core sequences published (18)] belong to the same subtype as EG-17, EG-19, and EG-7 [only NS5B sequences published (19)]. Phylogenetic analysis indicated that the EG-19 and EG-7 isolates possibly belong to another type 4 subtype (Fig. 3B).

Region            Isolates             Subtypes             Types
Core/E1, 579 bp   0.0017-0.1347        0.1330-0.3794        0.3479-0.6306
                  (0.0750 ± 0.0245)    (0.2786 ± 0.0363)    (0.4703 ± 0.0525)
E1, 384 bp        0.0026-0.2031        0.1645-0.4869        0.4309-0.9561
                  (0.0969 ± 0.0289)    (0.3761 ± 0.0433)    (0.6308 ± 0.0928)
NS5B, 340 bp      0.0003-0.1151        0.1384-0.2977        0.3581-0.6670
                  (0.0637 ± 0.0229)    (0.2219 ± 0.0341)    (0.4994 ± 0.0495)
NS5B, 222 bp      0.0000-0.1323        0.117-0.3538         0.3457-0.7471
                  (0.0607 ± 0.0205)    (0.2391 ± 0.0399)    (0.5295 ± 0.0627)

Figures were created by the PHYLIP program dnadist and are expressed as minimum to maximum values, with the mean ± SD in parentheses. Distances are given for isolates belonging to the same subtype (Isolates), for isolates of different subtypes of the same type (Subtypes), and for isolates of different types (Types).
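Table 1 sorts pairwise distances into three groups. Pairs that share a subtype go under Isolates, pairs that share only a type go under Subtypes, and pairs that differ in type go under Types. The sketch below shows that grouping and the min-max (mean ± SD) summary. The subtype labels and distances in the usage lines are toy values. The label format (a type number followed by one subtype letter) is an assumption. `statistics.stdev` computes the sample SD, which may not be the estimator used for the table.

```python
from itertools import combinations
from statistics import mean, stdev


def comparison_level(subtype_a: str, subtype_b: str) -> str:
    """Table 1 column for a pair of isolates, from subtype labels such as '4e'.

    Assumes each label is a type number followed by a single subtype letter.
    """
    if subtype_a == subtype_b:
        return "isolates"
    if subtype_a[:-1] == subtype_b[:-1]:
        return "subtypes"
    return "types"


def summarize(
    subtype_of: dict[str, str], distance: dict[frozenset, float]
) -> dict[str, tuple[float, float, float, float]]:
    """Return {level: (min, max, mean, sample SD)} over all pairs of isolates.

    `subtype_of` maps isolate name -> subtype label; `distance` maps
    frozenset({name1, name2}) -> pairwise distance.
    """
    groups: dict[str, list[float]] = {}
    for a, b in combinations(sorted(subtype_of), 2):
        level = comparison_level(subtype_of[a], subtype_of[b])
        groups.setdefault(level, []).append(distance[frozenset((a, b))])
    return {
        level: (min(d), max(d), mean(d), stdev(d) if len(d) > 1 else 0.0)
        for level, d in groups.items()
    }


# Toy example: four hypothetical isolates and made-up distances.
subtypes = {"W": "4a", "X": "4a", "Y": "4c", "Z": "2a"}
pairs = [(("W", "X"), 0.05), (("W", "Y"), 0.20), (("X", "Y"), 0.22),
         (("W", "Z"), 0.50), (("X", "Z"), 0.52), (("Y", "Z"), 0.48)]
dist = {frozenset(p): d for p, d in pairs}
summary = summarize(subtypes, dist)
print(summary)
```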

The relationships among isolates, subtypes, and types were expressed as the distance-matrix values obtained with the dnadist program (Table 1), rather than as percentage homology (see also refs. 11 and 21). This study allowed us to compare molecular evolutionary distances in two different regions of the HCV genome, as is currently recommended (22). The results obtained in one region generally supported those obtained in the other regions. Use of the E1 region is recommended because it provides information on the viral phenotype. However, nonoverlapping ranges of evolutionary distances for isolates, subtypes, and types exist only for the 340-bp NS5B region. This finding contradicts the previously reported phylogenetic analysis, in which the distances for the 222-bp NS5 region behaved as three nonoverlapping normal distributions (19). Border distances for the 340-bp NS5B region can therefore be used to classify newly obtained sequences (Table 1). These values may change as more sequence data become available. At present, the intermediate values are set at 0.127 for isolates/subtypes and 0.328 for subtypes/types.
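The border values can be checked against Table 1. The sketch below transcribes the minimum and maximum distances from the table and tests each region for gaps between adjacent levels. It then classifies a new distance using the intermediate values quoted above. By our own arithmetic (not stated in the text), the gap midpoints for the 340-bp NS5B region are 0.12675 and 0.3279, which round to 0.127 and 0.328. That suggests, but does not confirm, that the intermediate values were chosen as midpoints. Assigning a distance exactly equal to a border to the higher level is an assumption. The thresholds apply only to the 340-bp NS5B region.

```python
from typing import Optional

# Minimum and maximum dnadist distances transcribed from Table 1.
TABLE1 = {
    "Core/E1, 579 bp": {"isolates": (0.0017, 0.1347), "subtypes": (0.1330, 0.3794), "types": (0.3479, 0.6306)},
    "E1, 384 bp": {"isolates": (0.0026, 0.2031), "subtypes": (0.1645, 0.4869), "types": (0.4309, 0.9561)},
    "NS5B, 340 bp": {"isolates": (0.0003, 0.1151), "subtypes": (0.1384, 0.2977), "types": (0.3581, 0.6670)},
    "NS5B, 222 bp": {"isolates": (0.0000, 0.1323), "subtypes": (0.117, 0.3538), "types": (0.3457, 0.7471)},
}
LEVELS = ("isolates", "subtypes", "types")


def gap_midpoints(ranges: dict[str, tuple[float, float]]) -> list[Optional[float]]:
    """For each adjacent pair of levels, the midpoint between the lower level's
    maximum and the upper level's minimum, or None if the two ranges overlap."""
    midpoints: list[Optional[float]] = []
    for lower, upper in zip(LEVELS, LEVELS[1:]):
        lower_max, upper_min = ranges[lower][1], ranges[upper][0]
        midpoints.append((lower_max + upper_min) / 2 if lower_max < upper_min else None)
    return midpoints


def classify(distance: float, isolate_subtype: float = 0.127, subtype_type: float = 0.328) -> str:
    """Classify a pairwise 340-bp NS5B distance with the intermediate values from the text.

    A distance equal to a border is assigned to the higher level (an assumption).
    """
    if distance < isolate_subtype:
        return "same subtype"
    if distance < subtype_type:
        return "same type, different subtype"
    return "different type"


for region, ranges in TABLE1.items():
    print(region, gap_midpoints(ranges))
```

Only the 340-bp NS5B row yields two gaps. In every other region, at least one pair of adjacent ranges overlaps, which is consistent with the statement above.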

In conclusion, we provide evidence for the existence of at least eight subtypes of type 4 and four subtypes of type 2. We also show that, in addition to the E1 region, phylogenetic analysis should be performed using NS5B nucleotide sequences of at least 340 bp.